AITopics | visual place recognition

Collaborating Authors

visual place recognition

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Towards Implicit Aggregation: Robust Image Representation for Place Recognition in the Transformer Era

Neural Information Processing SystemsJun-18-2026, 19:47:08 GMT

Visual place recognition (VPR) is typically regarded as a specific image retrieval task, whose core lies in representing images as global descriptors. Over the past decade, dominant VPR methods (e.g., NetVLAD) have followed a paradigm that first extracts the patch features/tokens of the input image using a backbone, and then aggregates these patch features into a global descriptor via an aggregator. This backbone-plus-aggregator paradigm has achieved overwhelming dominance in the CNN era and remains widely used in transformer-based models. In this paper, however, we argue that a dedicated aggregator is not necessary in the transformer era, that is, we can obtain robust global descriptors only with the backbone. Specifically, we introduce some learnable aggregation tokens, which are prepended to the patch tokens before a particular transformer block. All these tokens will be jointly processed and interact globally via the intrinsic self-attention mechanism, implicitly aggregating useful information within the patch tokens to the aggregation tokens.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.67)
Transportation > Ground > Rail (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

MutualVPR: AMutual Learning Framework for Resolving Supervision Inconsistencies via Adaptive Clustering

Neural Information Processing SystemsJun-14-2026, 09:31:02 GMT

Visual Place Recognition (VPR) enables robust localization through image retrieval based on learned descriptors. However, drastic appearance variations of images at the same place caused by viewpoint changes can lead to inconsistent supervision signals, thereby degrading descriptor learning. Existing methods either rely on manually defined cropping rules or labeled data for view differentiation, but they suffer from two major limitations: (1) reliance on labels or handcrafted rules restricts generalization capability; (2) even within the same view direction, occlusions can introduce feature ambiguity. To address these issues, we propose MutualVPR, a mutual learning framework that integrates unsupervised view self-classification and descriptor learning. We first group images by geographic coordinates, then iteratively refine the clusters using K-means to dynamically assign place categories without orientation labels. Specifically, we adopt a DINOv2-based encoder to initialize the clustering.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
(3 more...)

Add feedback

EMVP: Embracing Visual Foundation Model for Visual Place Recognition with Centroid-Free Probing

Neural Information Processing SystemsMar-22-2026, 16:37:43 GMT

Visual Place Recognition (VPR) is essential for mobile robots as it enables them to retrieve images from a database closest to their current location. The progress of Visual Foundation Models (VFMs) has significantly advanced VPR by capturing representative descriptors in images. However, existing fine-tuning efforts for VFMs often overlook the crucial role of probing in effectively adapting these descriptors for improved image representation. In this paper, we propose the Centroid-Free Probing (CFP) stage, making novel use of second-order features for more effective use of descriptors from VFMs. Moreover, to control the preservation of task-specific information adaptively based on the context of the VPR, we introduce the Dynamic Power Normalization (DPN) module in both the recalibration and CFP stages, forming a novel Parameter Efficiency Fine-Tuning (PEFT) pipeline (EMVP) tailored for the VPR task. Extensive experiments demonstrate the superiority of the proposed CFP over existing probing methods. Moreover, the EMVP pipeline can further enhance fine-tuning performance in terms of accuracy and efficiency.

artificial intelligence, name change, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Robots (0.59)

Add feedback

Multiview Scene Graph

Neural Information Processing SystemsMar-18-2026, 21:36:32 GMT

A proper scene representation is central to the pursuit of spatial intelligence where agents can robustly reconstruct and efficiently understand 3D scenes. A scene representation is either metric, such as landmark maps in 3D reconstruction, 3D bounding boxes in object detection, or voxel grids in occupancy prediction, or topological, such as pose graphs with loop closures in SLAM or visibility graphs in SfM. In this work, we propose to build Multiview Scene Graphs (MSG) from unposed images, representing a scene topologically with interconnected place and object nodes. The task of building MSG is challenging for existing representation learning methods since it needs to jointly address both visual place recognition, object detection, and object association from images with limited fields of view and potentially large viewpoint changes. To evaluate any method tackling this task, we developed an MSG dataset and annotation based on a public 3D dataset. We also propose an evaluation metric based on the intersection-over-union score of MSG edges. Moreover, we develop a novel baseline method built on mainstream pretrained vision models, combining visual place recognition and object association into one Transformer decoder architecture. Experiments demonstrate that our method has superior performance compared to existing relevant baselines.

artificial intelligence, name change, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.83)

Add feedback

1fac855f8225e0b9cdb904a1e0118fdc-Paper-Conference.pdf

Neural Information Processing SystemsFeb-9-2026, 06:32:52 GMT

We also propose an evaluation metric based on the intersection-over-union score of MSG edges.

justification, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America > United States (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

0b135d408253205ba501d55c6539bfc7-Paper-Conference.pdf

Neural Information Processing SystemsFeb-7-2026, 17:04:48 GMT

dataset, descriptor, supervlad, (16 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(3 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Information Technology (0.46)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(4 more...)

Add feedback

Adaptive Thresholding for Visual Place Recognition using Negative Gaussian Mixture Statistics

Trinh, Nick, Lyons, Damian

arXiv.org Artificial IntelligenceDec-11-2025

Visual place recognition (VPR) is an important component technology for camera-based mapping and navigation applications. This is a challenging problem because images of the same place may appear quite different for reasons including seasonal changes, weather illumination, structural changes to the environment, as well as transient pedestrian or vehicle traffic. Papers focusing on generating image descriptors for VPR report their results using metrics such as recall@K and ROC curves. However, for a robot implementation, determining which matches are sufficiently good is often reduced to a manually set threshold. And it is difficult to manually select a threshold that will work for a variety of visual scenarios. This paper addresses the problem of automatically selecting a threshold for VPR by looking at the 'negative' Gaussian mixture statistics for a place - image statistics indicating not this place. We show that this approach can be used to select thresholds that work well for a variety of image databases and image descriptors.

artificial intelligence, machine learning, threshold, (14 more...)

arXiv.org Artificial Intelligence

2512.09071

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

EMVP: Embracing Visual Foundation Model for Visual Place Recognition with Centroid-Free Probing

Neural Information Processing SystemsNov-20-2025, 05:11:28 GMT

Specifically, it achieves 93.9%, 96.5%, and 94.6% Recall@1 on the MSLS V alidation, Pitts250k-test, and SPED datasets, respectively, while saving 64.3% of trainable parameters compared with the existing SOT A PEFT method.

descriptor, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Europe > Czechia > Prague (0.04)
Asia > China > Hong Kong (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

Going Places: Place Recognition in Artificial and Natural Systems

Milford, Michael, Fischer, Tobias

arXiv.org Artificial IntelligenceNov-19-2025

Place recognition--the process of an animal, person or robot recognizing a familiar location in the world--has attracted significant attention across multiple disciplines. In animals, this capability has evolved over millions of years through sophisticated neural mechanisms: hippocampal place cells fire at specific spatial locations (1), entorhinal grid cells provide spatial coordinates through hexagonal firing patterns (2), while diverse species demonstrate remarkable navigation--from desert ants using celestial cues and visual panoramas (3) to migratory birds returning to precise breeding sites across hemispheric distances (4). Humans extend these biological foundations with unique cognitive abilities, recognizing places not only through sensory perception but also through semantic meaning, emotional associations, and cultural context--enabling us to identify familiar locations from descriptions, memories, or even fictional narratives (5). In artificial systems, place recognition underpins core robotics functions such as localization, mapping, and long-term autonomy, developing into a mature field that, while sometimes inspired by biological principles, often diverges significantly in implementation to optimize for computational efficiency and metric accuracy. As research has grown in the area, so too has a rich landscape of surveys and reviews that reflect the field's evolution and diversification.

machine learning, natural language, recognition, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1146/annurev-control-032724-014418

2511.14341

Country: North America > United States (0.93)

Genre:

Research Report (0.81)
Overview (0.66)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
(4 more...)

Add feedback

Scaling Image Geo-Localization to Continent Level

Lindenberger, Philipp, Sarlin, Paul-Edouard, Hosang, Jan, Balice, Matteo, Pollefeys, Marc, Lynen, Simon, Trulls, Eduard

arXiv.org Artificial IntelligenceOct-31-2025

Determining the precise geographic location of an image at a global scale remains an unsolved challenge. Standard image retrieval techniques are inefficient due to the sheer volume of images (>100M) and fail when coverage is insufficient. Scalable solutions, however, involve a trade-off: global classification typically yields coarse results (10+ kilometers), while cross-view retrieval between ground and aerial imagery suffers from a domain gap and has been primarily studied on smaller regions. This paper introduces a hybrid approach that achieves fine-grained geo-localization across a large geographic expanse the size of a continent. We leverage a proxy classification task during training to learn rich feature representations that implicitly encode precise location information. We combine these learned prototypes with embeddings of aerial imagery to increase robustness to the sparsity of ground-level data. This enables direct, fine-grained retrieval over areas spanning multiple countries. Our extensive evaluation demonstrates that our approach can localize within 200m more than 68\% of queries of a dataset covering a large part of Europe. The code is publicly available at https://scaling-geoloc.github.io.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.26795

Country:

Europe (1.00)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback